The present invention relates to a video coding method of exploiting the temporal
redundancy between successive frames in a video sequence. 

Efficiently encoding video or moving pictures relies heavily on exploiting the
temporal redundancy between successive frames in a sequence. Based on the
assumption that local motions are slow with respect to the temporal sampling
period, several techniques have been proposed for efficiently removing this
redundancy. The most successful and acclaimed method is block-based motion
prediction, heavily used in current standards such as MPEG-4 and H.26L. Roughly
speaking, these compression schemes predict a frame in the sequence based on the
knowledge of previous frames. The current (or predicted) frame is cut into blocks
of fixed size and, for each block, the best matching block is searched in the
reference frame. Displacement vectors are then encoded so that the decoder can
reconstruct the prediction of the current frame from the previously decoded
frame(s). As the block-based prediction is not accurate enough to encode the
current frame perfectly, the error between the original and the predicted frame
is encoded separately. This is generally referred to as texture coding or motion
residual coding. The main drawback of this method lies in the blocky nature of
the prediction mechanism, which gives rise to very noticeable blocky artefacts at
low bit rates. Moreover, such a system, while well suited for wide translational
motions, is unable to cope with locally complex movements or even global
geometric transformations such as zoom or rotation. Finally, block-based motion
prediction is not able to follow natural features of images since it is stuck in
a fixed framework based on artificial image primitives (blocks).
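
For reference, the block-matching search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: frames are nested lists of gray levels, the search is exhaustive, and the matching criterion is the sum of absolute differences (SAD). The name `block_match` and the parameters `size` and `radius` are hypothetical and are not taken from any of the cited standards.

```python
def block_match(ref, cur, top, left, size, radius):
    """Find the displacement (dy, dx) of the block of `cur` at (top, left).

    The block is compared against every candidate block of `ref` within
    +/- `radius` pixels, and the candidate with the smallest sum of
    absolute differences (SAD) wins. SAD is an assumed criterion.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue
            sad = sum(
                abs(cur[top + i][left + j] - ref[y0 + i][x0 + j])
                for i in range(size)
                for j in range(size)
            )
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best[1], best[2]


# Usage: a bright 2x2 square moves from (row 2, col 2) to (row 3, col 4).
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        ref[2 + i][2 + j] = 200
        cur[3 + i][4 + j] = 200
vector = block_match(ref, cur, top=3, left=4, size=2, radius=3)
```

The returned vector points from the block in the current frame to its best match in the reference frame. Every pixel of the block shares this single displacement, which is why rotations and zooms are poorly captured.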
This invention proposes to solve the above-mentioned problems by introducing a
new paradigm for dealing with spatio-temporal redundancy. 

The method according to the invention is defined in claim 1. In the dependent claims
various embodiments are proposed.

The invention will be described with the help of accompanying representations. 

Figure 1: shows the progressive reconstruction of an I-frame with 50, 100, 150 and
200 atoms.

Figure 2: shows a flow-chart of the I-frames codec, where the atoms are used to
transmit the frame.

Figure 3: shows a flow-chart of the I-frames codec, where the atoms are estimated
from the coded image both at the encoder and decoder.

Figure 4: shows three successive schematic updates of basis functions (atoms)
inside a sequence of frames.

Figure 5: shows a flow-chart of the P-frames coder, using atom based motion
prediction.

Figures 6 and 7: show the encoding of the Foreman sequence: I-frame, predicted
frames #20, 50 and 99.

Figure 8: represents PSNR along the encoded Foreman sequence as a function of 
the number of predicted frames. 
The proposed method is composed of two main parts. In a first step, a geometric
model of a reference frame is built. In the second step, the model is updated by
deformations in order to match successive frames.



A reference frame (or intra frame, I-frame) is first decomposed into a linear
combination of basis functions (atoms) selected in a redundant, structured library
(see P. Vandergheynst and P. Frossard, Efficient image representation by
anisotropic refinement in matching pursuit, in Proceedings of IEEE ICASSP, Salt
Lake City, UT, May 2001, vol. 3, the content of which is incorporated herein by
reference):

$$I(x,y) = \sum_{n=0}^{N-1} c_n \, g_{\gamma_n}(x,y)$$



In this equation, $I(x,y)$ is the intensity of the I-frame, represented as a function
giving the gray level value of the pixel at position $(x,y)$. The $c_n$ are weighting
coefficients and the $g_{\gamma_n}(x,y)$ are the atoms involved in the decomposition.
These atoms are image patches generated by a simple mathematical formula expressing
the gray level value at pixel position $(x,y)$. The formula is built by applying
geometric transformations to a function $g(x,y)$ that we call a generating mother
function. The parameters of these transformations are concatenated in the vector of
parameters $\gamma$. Examples of possible transformations are translations, rotations
or dilations. They act on the generating mother function by change of variable, for
example:

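Translations: $g_b(x,y) = g(x - b_1, y - b_2)$

Dilations: $g_a(x,y) = a^{-1} \, g(x/a, y/a)$

Translations, anisotropic dilations and rotations:

$$g_\gamma(x,y) = \frac{1}{\sqrt{a_1 a_2}} \, g(x_n, y_n), \quad x_n = \frac{\cos\vartheta\,(x-b_1) - \sin\vartheta\,(y-b_2)}{a_1}, \quad y_n = \frac{\sin\vartheta\,(x-b_1) + \cos\vartheta\,(y-b_2)}{a_2}$$

and $\gamma = [a_1, a_2, b_1, b_2, \vartheta]$ are the parameters of this transformation.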
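
The change of variables above can be illustrated with a short Python sketch that samples one atom on a pixel grid. Several things here are assumptions rather than part of the description: the mother function `mother` (a Gaussian envelope with one oscillation along x) is chosen only for illustration, the normalisation 1/sqrt(a1*a2) is assumed so that it reduces to 1/a for an isotropic dilation, and the names `atom`, `width` and `height` are hypothetical.

```python
import math


def mother(x, y):
    """Assumed generating mother function: oscillating along x, Gaussian along y."""
    return (4.0 * x * x - 2.0) * math.exp(-(x * x + y * y))


def atom(gamma, width, height, g=mother):
    """Sample the atom g_gamma on a width x height grid.

    gamma = (a1, a2, b1, b2, theta): anisotropic scales, translation, rotation.
    """
    a1, a2, b1, b2, theta = gamma
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    norm = 1.0 / math.sqrt(a1 * a2)  # assumed normalisation
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Translate, rotate, then scale the coordinates.
            xn = (cos_t * (x - b1) - sin_t * (y - b2)) / a1
            yn = (sin_t * (x - b1) + cos_t * (y - b2)) / a2
            row.append(norm * g(xn, yn))
        image.append(row)
    return image


# Usage: an atom centred at (4, 4), dilated by 2 in both directions.
patch = atom((2.0, 2.0, 4.0, 4.0, 0.0), width=9, height=9)
```

Because the whole patch is determined by five parameters and one coefficient, moving or deforming a feature only requires changing those few numbers.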

Generating mother functions are chosen almost arbitrarily. Their properties can be
adapted to the specific application. A possible example is to select an oscillating
function of the form:



The decomposition can for example be accomplished using Matching Pursuit
(see S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries,
IEEE Transactions on Signal Processing, 41(12):3397-3415, December 1993, the
content of which is incorporated herein by reference). Matching Pursuit (MP) is a
greedy algorithm that iteratively decomposes the image using the following
scheme. First, the atom that best matches the image is searched by maximizing the
scalar product $\langle I | g_\gamma \rangle$ between the image and the atoms of the
dictionary, and a residual image is computed:




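$$I = \langle I | g_{\gamma_0} \rangle \, g_{\gamma_0} + R_1$$

Then the same process is applied to the residual:

$$R_1 = \langle R_1 | g_{\gamma_1} \rangle \, g_{\gamma_1} + R_2$$

and iteratively:

$$R_n = \langle R_n | g_{\gamma_n} \rangle \, g_{\gamma_n} + R_{n+1}$$

Finally this yields a decomposition of the image in terms of a sum of atoms:

$$I(x,y) = \sum_{n=0}^{N-1} \langle R_n | g_{\gamma_n} \rangle \, g_{\gamma_n}(x,y) + R_N(x,y), \quad \text{with } R_0 = I$$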
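
The greedy iteration can be sketched as follows. This is a minimal illustration, not the codec's implementation: the image and the atoms are assumed to be flattened into plain lists of equal length, the dictionary is a small explicit list of unit-norm atoms rather than a structured library, and the best atom is taken as the one with the largest scalar product in absolute value. The names `matching_pursuit`, `dot` and `normalize` are hypothetical.

```python
import math


def dot(u, v):
    """Scalar product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit norm, as assumed for dictionary atoms."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy decomposition: returns [(coefficient, atom index), ...] and the residual."""
    residual = list(signal)
    decomposition = []
    for _ in range(n_atoms):
        # Project the residual on every atom and keep the strongest match.
        scores = [dot(residual, g) for g in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        c = scores[k]
        decomposition.append((c, k))
        # Subtract the selected component to obtain the next residual.
        residual = [r - c * a for r, a in zip(residual, dictionary[k])]
    return decomposition, residual


# Usage: a redundant dictionary of four atoms in a 3-dimensional space.
dictionary = [
    normalize(v) for v in ([1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1])
]
decomposition, residual = matching_pursuit([3.0, 3.0, 0.5], dictionary, n_atoms=2)
```

In this toy case the diagonal atom captures both equal components at once, so two atoms suffice where an orthogonal basis would need three. That is the benefit of a redundant dictionary that the method relies on.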



The basis functions (atoms) are indexed by a string of parameters $\gamma_n$ representing
geometric transformations applied to a generating mother function $g(x,y)$. This
index can be seen as a point on a manifold. The set of geometric transformations
is designed in such a way that the total collection of basis functions (atoms) is a
dense subspace of $L^2(\mathbb{R}^2)$, i.e. any image can be exactly represented.

This part of the method expresses the I-frame as a collection of atoms that can be
seen as geometric features such as edges or parts of objects which are very
noticeable by the human eye. These basic primitives, hereafter referred to as atoms,
form a primal sketch of the image. The atoms are modelled and fully represented by the
set of coefficients and parameters $\{c_n, \gamma_n, \; n = 0, \ldots, N-1\}$, where
$c_n$ is the coefficient and $\gamma_n$ is a vector of parameters.



There are two ways to handle the coding of the I-frames. The first one is to
estimate the atoms of the original frame. The atoms modelling the I-frame are then
quantized, entropy coded and sent in the bitstream. The process of quantizing an
atom corresponds to the quantization of its coefficient and parameters. The atoms
are also stored in memory in order to be used for the prediction of the next frames.
A flowchart of this procedure is shown in Figure 2, where the atoms are used to
transmit the frame. Dotted lines represent the coefficients and parameters of the
atoms, the bold solid line represents a video frame and the light solid line
represents bitstreams. The flow-chart should be understood as follows: the input
frame is decomposed into atoms using the MP algorithm. The atoms are then quantized
and entropy coded. They are also de-quantized and stored in memory for the
prediction of the next frames. Those de-quantized atom parameters are used to
reconstruct the image, with the help of the generating mother function (which is
known both by the encoder and decoder). Finally the difference between the original
frame and the reconstructed one is computed and encoded using the frame coder.

Note that the figure includes an optional step that encodes the motion residuals
(or texture), i.e. the difference between the original frame and the one
reconstructed using the atoms. This encoding can be used to further increase the
quality of the decoded image up to a lossless reconstruction.



The second way of handling the I-frames is more conventional. The original frame
is encoded and transmitted using any frame codec. Then the atoms are estimated
from the reconstructed frame, both at the encoder and at the decoder. Finally those
atoms are stored in memory for the prediction of future frames. The flowchart of
this procedure is shown in Figure 3, where the atoms are estimated from the coded
image both at the encoder and decoder. Dotted lines represent the coefficients and
parameters of the atoms, the bold solid line represents a video frame and the light
solid line represents bitstreams. The flow-chart should be understood as follows:
the input frame is first encoded with the frame coder and sent to the decoder in a
bitstream. It is also decoded, in order to get the same frame that will be available
at the decoder. This frame is decomposed into atoms by the MP algorithm. Finally
those atoms are stored in memory to be used for the prediction of the next frames.

The second step of the method consists in updating the image model (the set of all
atoms) in order to take into account the geometric deformations that have occurred
between the reference and the current frame. Clearly, since the model is based on
geometric transformations, updating its atoms allows for adapting to smooth local
distortions (translations, rotations and scalings are common examples). In order to
compute this update, we assume that the deformed model is close enough to the
reference model. We thus have to search for new atom parameters in the proximity
of the previous solution. This is performed by means of a local optimisation
procedure trying to minimize the mean square error between the updated model and
the current frame (Figure 4, where three successive schematic updates of basis
functions (atoms) inside a sequence of frames are represented). The updated atom
parameters are then the solution of:

$$\{c_n^t, \gamma_n^t\} = \arg\min_{\{c_n, \gamma_n\}} \Big\| I_t - \sum_{n=0}^{N-1} c_n \, g_{\gamma_n} \Big\|^2$$



where the optimization method for frame $I_t$ at time $t$ is initialised with the atom
parameters corresponding to the solution at time $t-1$ or to the reference frame (the
I-frame) in order to avoid error propagation. This problem is a non-convex,
non-linear, differentiable optimisation problem (see Dimitri P. Bertsekas (1999),
Nonlinear Programming, 2nd Edition, Athena Scientific,
http://www.athenasc.com/nonlinbook.html, the content of which is incorporated
herein by reference), which can be solved using various algorithms such as
quasi-Newton methods, combined with line-search or trust-region globalisation
techniques (see Conn A., Gould N. & Toint Ph. (2000), Trust Region Methods, SIAM,
http://www.fundp.ac.be/~phtoint/pht/trbook.html, the content of which is
incorporated herein by reference), in order to identify a local optimum.
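
The idea of a local search started from the previous solution can be sketched as below. To keep the example self-contained, it swaps the quasi-Newton and trust-region methods mentioned above for a much simpler derivative-free coordinate search with step halving, so it is a stand-in and not the optimiser of the cited references. Further assumptions: each atom is stored as a list `[c, a1, a2, b1, b2, theta]`, the mother function is a plain Gaussian, and the names `render`, `mse`, `update_atoms` and `SEARCH_ORDER` are hypothetical.

```python
import math

# Assumed atom layout: [c, a1, a2, b1, b2, theta].
# Translations are tried first, then scales, rotation and coefficient.
SEARCH_ORDER = (3, 4, 1, 2, 5, 0)


def render(atoms, width, height):
    """Reconstruct the model image as the weighted sum of all atoms."""
    image = [[0.0] * width for _ in range(height)]
    for c, a1, a2, b1, b2, theta in atoms:
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for y in range(height):
            for x in range(width):
                xn = (cos_t * (x - b1) - sin_t * (y - b2)) / a1
                yn = (sin_t * (x - b1) + cos_t * (y - b2)) / a2
                # Assumed Gaussian mother function with 1/sqrt(a1*a2) normalisation.
                image[y][x] += c / math.sqrt(a1 * a2) * math.exp(-(xn * xn + yn * yn))
    return image


def mse(a, b):
    """Mean square error between two images of identical size."""
    count = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count


def update_atoms(atoms, frame, step=1.0, min_step=1e-3):
    """Local search around the previous atoms; returns (new atoms, final MSE)."""
    height, width = len(frame), len(frame[0])
    atoms = [list(a) for a in atoms]  # work on a copy
    best = mse(render(atoms, width, height), frame)
    while step >= min_step:
        improved = False
        for current in atoms:
            for i in SEARCH_ORDER:
                for delta in (step, -step):
                    old = current[i]
                    current[i] = old + delta
                    if current[1] > 0 and current[2] > 0:  # scales stay positive
                        err = mse(render(atoms, width, height), frame)
                        if err < best:
                            best, improved = err, True
                            continue  # keep the change
                    current[i] = old  # revert the change
        if not improved:
            step /= 2.0  # refine the search once no move helps
    return atoms, best


# Usage: the feature modelled by one atom moves one pixel to the right.
reference = [[10.0, 3.0, 1.5, 5.0, 5.0, 0.0]]
frame = render([[10.0, 3.0, 1.5, 6.0, 5.0, 0.0]], width=12, height=12)
updated, error = update_atoms(reference, frame)
```

Like the methods cited above, such a search only finds a local optimum, which is why the initialisation from the previous frame (or from the I-frame) matters.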

The difference between the original atom parameters and the updated ones is then
computed and sent together with updated atom coefficients. Quantization and
entropy coding can then be performed (P. Frossard, P. Vandergheynst and R. M.
Figueras y Ventura, Redundancy driven a posteriori matching pursuit quantization,
ITS Technical Report 2002, the content of which is incorporated herein by
reference) and a bitstream generated. This procedure is shown in detail in
Figure 5, which represents a flow-chart of the P-frames coder, using atom
based motion prediction. Dotted lines represent the coefficients and parameters of
the atoms, the bold solid line represents a video frame and the light solid line
represents bitstreams. The flow-chart should be understood as follows: the input
frame is passed to the parameter estimation, which modifies the atom parameters
stored in memory so that they correctly describe the input. The difference
between the new atom parameters and the ones stored in memory is computed
and the result is quantized and entropy coded. Those quantized differences are also
de-quantized and added to the atom parameters previously stored in memory. This
allows the reconstruction of the same atoms as the ones available at the decoder.
Those reconstructed atoms are stored in memory, replacing the
ones of the previous frame. They are also used to reconstruct the current frame
using the generating mother function available at the encoder and decoder. The
difference between this reconstruction and the original input is computed and
encoded using the frame coder. The number of updated atoms and their quantization
can be fixed but can also be chosen in an adaptive manner through rate and distortion
constraints by means of a rate controller. For each motion predicted frame, the
motion residuals (or texture) can be computed and encoded using any still image
codec. They would then be sent in the bitstream in order to generate a scalable stream
achieving lossy to lossless compression.
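
The differential update of the stored atoms can be sketched as follows. This is an illustration under assumptions, not the quantizer of the cited technical report: a uniform scalar quantizer with one step size per parameter is assumed, entropy coding is omitted, and the names `quantize`, `dequantize`, `encode_update` and `decode_update` are hypothetical.

```python
def quantize(values, steps):
    """Uniform scalar quantization (assumed): one integer symbol per value."""
    return [round(v / s) for v, s in zip(values, steps)]


def dequantize(symbols, steps):
    """Inverse of `quantize`, up to the quantization error."""
    return [q * s for q, s in zip(symbols, steps)]


def decode_update(stored, symbols, steps):
    """Decoder side: add the de-quantized differences to the stored atom."""
    return [p + d for p, d in zip(stored, dequantize(symbols, steps))]


def encode_update(stored, updated, steps):
    """Encoder side: returns the symbols to transmit and the atom to store.

    The encoder stores the atom rebuilt from the quantized differences,
    not the exact updated atom, so that it stays in step with the decoder.
    """
    differences = [u - p for u, p in zip(updated, stored)]
    symbols = quantize(differences, steps)
    return symbols, decode_update(stored, symbols, steps)


# Usage with an assumed layout [c, a1, a2, b1, b2, theta] and assumed step sizes.
steps = [0.5, 0.25, 0.25, 0.5, 0.5, 0.05]
stored = [10.0, 2.0, 2.0, 5.0, 5.0, 0.0]
updated = [9.6, 2.1, 2.0, 6.2, 4.9, 0.12]
symbols, encoder_memory = encode_update(stored, updated, steps)
decoder_memory = decode_update(stored, symbols, steps)
```

Since both sides apply the same de-quantized differences to the same stored values, the encoder and decoder memories remain identical from frame to frame, which prevents drift in the prediction.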

Typical results of the motion prediction, with a coding cost of 170 Kbps on
average, are shown in Figure 6, where predicted frames 20, 50 and 99 are represented.
It can be seen that the motion prediction stays accurate even after 100 frames,
even in the absence of encoding of the motion residual. This is the major
advantage of this technique compared to block-based compensation. In order to show
the advantage of the new prediction technique, we have done the same experiment
with typical block matching compensation. The same sequence was encoded using
adaptive block compensation with block sizes of 4x4 to 32x32. The encoder
automatically selected the best block size according to Rate-Distortion
optimisation. By varying the block size it is possible to control the size of the
compressed bitstream in order to match the result of the atom based motion
prediction. As in the case of the atom based prediction, the motion residuals were
not coded. The motion prediction results of the block matching are shown in Figure 7.
A quantitative comparison can be done using the PSNR measure. This measure is
computed using the squared difference between the original and the reconstructed
frame, i.e.

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{XY} \sum_{x,y} \big( I(x,y) - I_r(x,y) \big)^2}$$

where $I(x,y)$ is the original frame, $I_r(x,y)$ is the reconstructed frame and the
sum runs over the $X \times Y$ pixels of the frame. The PSNR comparisons of the two
prediction methods, for a bitstream of average size of 170 Kbps, are shown in
Figure 8. The performance of the current atom based motion prediction is
particularly good considering that state-of-the-art block-based motion
prediction was used for the comparison. Moreover, it can be seen that in the long
term (more than 50 frames) the prediction of the atom based method is more constant.
In typical block matching applications, the block size is limited to 8x8, which
causes poorer performance for the block matching.
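
The PSNR measure can be computed as in the sketch below, assuming 8-bit gray-level frames stored as nested lists and the usual normalisation of the squared error by the number of pixels. The function name `psnr` and the `peak` parameter are hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of identical size."""
    count = len(original) * len(original[0])
    squared_error = sum(
        (p - q) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for p, q in zip(row_o, row_r)
    )
    mean_squared_error = squared_error / count
    if mean_squared_error == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mean_squared_error)


# Usage: a single pixel of a 2x2 frame is off by 5 gray levels.
frame = [[0, 255], [255, 0]]
decoded = [[0, 250], [255, 0]]
quality = psnr(frame, decoded)
```

A higher value means a smaller reconstruction error, so a curve that stays flat along the sequence, as described for the atom based method, indicates a prediction that does not degrade over time.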



The number of predicted frames can be fixed but can also be chosen in an adaptive
manner through rate and distortion constraints by means of a rate controller. For
instance, when abrupt changes occur (shots between scenes), the model is not an
accurate base anymore and the reference I-frame can simply be refreshed in order
to compute a new model. Such a refresh mechanism can be monitored by tracking
frame-to-frame distortion, which should stay rather constant for smooth updates of
the initial model (Figure 8).
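
One possible way to monitor the frame-to-frame distortion is sketched below. The decision rule is purely an assumption made for illustration: a refresh is requested when the distortion of the latest predicted frame exceeds a fixed multiple of the average distortion observed since the last I-frame. The name `needs_refresh` and the `tolerance` parameter are hypothetical.

```python
def needs_refresh(distortions, tolerance=2.0):
    """Decide whether the reference I-frame should be refreshed.

    `distortions` holds one distortion value (e.g. mean square error) per
    predicted frame since the last I-frame, the latest one being last.
    """
    if len(distortions) < 2:
        return False  # not enough history to define a baseline
    history = distortions[:-1]
    baseline = sum(history) / len(history)
    return distortions[-1] > tolerance * baseline


# Usage: smooth updates, then a scene cut on the last frame.
smooth = needs_refresh([4.0, 4.2, 3.9, 4.1])
scene_cut = needs_refresh([4.0, 4.2, 3.9, 12.5])
```

A rate controller could combine such a test with a maximum number of predicted frames, so that the model is rebuilt either after a scene change or at regular intervals.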
CLAIMS



1. Video coding method of exploiting the temporal redundancy between
successive frames in a video sequence, characterized in that a reference
frame, called I-frame, is first approximated by a collection of geometric
features, called atoms, and that the following predicted frames, called
P-frames, are approximated by the geometric transformations of the
geometric features (atoms) describing the previous frame.

2. Video coding method according to claim 1, characterized in that the
I-frame is approximated by a linear combination of N atoms $g_{\gamma_n}(x,y)$:

$$I(x,y) = \sum_{n=0}^{N-1} c_n \, g_{\gamma_n}(x,y),$$

selected in a redundant, structured library and indexed by a string of
parameters $\gamma_n$ representing the geometric transformations applied to
the generating mother function $g(x,y)$, and the $c_n$ are weighting coefficients.

3. Video coding method according to claim 2, characterized in that the at- 
oms occurring in the decomposition are chosen using the Matching Pursuit 
algorithm. 

4. Video coding method according to one of the claims 1 to 3, characterized 
in that the parameters and coefficients of the atoms are quantized and en- 
tropy coded. 

5. Video coding method according to claim 4, characterized in that the
quantization of the parameters and the coefficients can vary across time,
and that the variation is controlled by a rate control unit.
6. Video coding method according to one of the claims 1 to 5, characterized
in that the system is used as a motion prediction, and that the differences 
between the original frames and the ones reconstructed using the atoms, 
called the residual images, are encoded using another frame based codec. 

7. Video coding method according to one of the claims 1 to 6, characterized 
in that the geometric features (atoms) of the I-frame are computed from 
the quantized frames at the encoder and decoder and are not transmitted. 

8. Video coding method according to one of the claims 1 to 7, characterized 
in that the geometric features (atoms) are re-computed after each quan- 
tized frame at the encoder and decoder and replace the previous prediction. 

9. Video coding method according to one of the claims 1 to 8, characterized
in that the geometric transformations used to build the library are composed
of translations, anisotropic dilations and rotations, applied to a generating
mother function $g(x,y)$ by means of the following change of variables:

$$g_\gamma(x,y) = \frac{1}{\sqrt{a_1 a_2}} \, g(x_n, y_n), \text{ where}$$

$$x_n = \frac{\cos\vartheta\,(x-b_1) - \sin\vartheta\,(y-b_2)}{a_1}$$

$$y_n = \frac{\sin\vartheta\,(x-b_1) + \cos\vartheta\,(y-b_2)}{a_2}$$

10. Video coding method according to one of the claims 1 to 9, characterized 
in that the generating mother function is of the following form: 
The invention relates to a video coding method of exploiting the temporal
redundancy between successive frames in a video sequence. A reference frame, called
I-frame, is first approximated by a collection of geometric features, called atoms.
The following predicted frames, called P-frames, are approximated by the
geometric transformations of the geometric features (atoms) describing the previous
frame. Preferably, the I-frame is approximated by a linear combination of N atoms
$g_{\gamma_n}(x,y)$:

$$I(x,y) = \sum_{n=0}^{N-1} c_n \, g_{\gamma_n}(x,y),$$

selected in a redundant, structured library. They are indexed by a string of
parameters $\gamma_n$ representing the geometric transformations applied to the
generating mother function $g(x,y)$, and the $c_n$ are weighting coefficients.